What is claimed is: 



1. A method of normalizing genetic data for n loci, wherein n is an integer 
greater than one, comprising 

(a) obtaining genetic data comprising n sets of first and second signal values
related in a coordinate system, wherein said first and second signal values
are indicative of the levels of a first and second allele, respectively, at n
loci;

(b) identifying a set of sweep points in said coordinate system;

(c) identifying a set of control points, said control points comprising at least
a subset of said signal values that are proximal to said sweep points;

(d) determining parameters of a registration transformation equation based
on said set of control points; and

(e) transforming said n sets of first and second signal values according to
said registration transformation equation and said parameters, thereby
normalizing said genetic data.

2. The method of claim 1, wherein said genetic data is represented in a 
graphical format. 

3. The method of claim 2, wherein said graphical format comprises Cartesian 
coordinates. 

4. The method of claim 1, wherein said genetic data is provided in a tabular
format.

5. The method of claim 1, wherein n is at least 2. 

6. The method of claim 1, wherein said identifying sweep points comprises

(i) identifying an upper limit on a line or curve through said coordinate system;
and

(ii) locating said sweep points between the origin of each axis and said upper
limit.

7. The method of claim 6, wherein said upper limit has a value in a first 
dimension that is greater than or equal to the first dimension of any of said signal 
values. 

8. The method of claim 6, further comprising a step of identifying a lower
limit on said line or curve, and wherein said locating comprises locating said sweep
points between said lower limit and said upper limit.
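The sweep-point placement of claims 6 to 8 can be sketched as follows. This is a minimal illustration, assuming that signal values are (x, y) tuples and that the sweep points lie on the first-allele axis (y = 0); the helper names `upper_limit` and `sweep_points` and the `spacing` option are hypothetical and are not recited in the claims.

```python
def upper_limit(signals):
    """Upper limit >= the first dimension of every signal value (claim 7)."""
    return max(x for x, _ in signals)


def sweep_points(lower, upper, count, spacing="linear"):
    """Place `count` sweep points on the x-axis between `lower` and `upper`.

    Hypothetical helper: 'linear' spacing is even in x, 'log' spacing is
    even in log(x) and therefore needs a positive lower limit.
    """
    if count < 2:
        raise ValueError("need at least two sweep points")
    if spacing == "linear":
        xs = [lower + (upper - lower) * i / (count - 1) for i in range(count)]
    elif spacing == "log":
        if lower <= 0:
            raise ValueError("log spacing needs a positive lower limit")
        ratio = (upper / lower) ** (1 / (count - 1))
        xs = [lower * ratio ** i for i in range(count)]
    else:
        raise ValueError(f"unknown spacing: {spacing!r}")
    # Each sweep point sits on the first-allele axis (y = 0).
    return [(x, 0.0) for x in xs]


signals = [(120.0, 30.0), (800.0, 760.0), (1500.0, 40.0)]
print(sweep_points(100.0, upper_limit(signals), 5))
```

With a largest first-dimension value of 1500 and a lower limit of 100, the five linearly spaced sweep points fall at x = 100, 450, 800, 1150 and 1500.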

9. The method of claim 1, wherein said identifying a set of control points
comprises triangulation using pairs of signal values and a sweep point.

10. The method of claim 9, wherein said triangulation comprises Delaunay 
triangulation. 

11. The method of claim 1, wherein said identifying a set of control points
comprises computing all pair-wise distances between the signal values and each sweep
point.
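A sketch of the distance-based selection in claim 11 follows. It assumes Euclidean distance and keeps the k nearest signal values per sweep point; `control_points` and the parameter `k` are hypothetical illustrations, not terms from the claims.

```python
import math


def control_points(signals, sweeps, k=1):
    """Return the signal values nearest to each sweep point.

    Every pairwise Euclidean distance between a signal value and a sweep
    point is computed; the k closest per sweep point are kept (k is a
    hypothetical tuning parameter), with duplicates dropped.
    """
    chosen = []
    for sweep in sweeps:
        ranked = sorted(signals, key=lambda s: math.dist(s, sweep))
        for s in ranked[:k]:
            if s not in chosen:
                chosen.append(s)
    return chosen


signals = [(120.0, 30.0), (450.0, 20.0), (800.0, 760.0), (1500.0, 40.0)]
sweeps = [(100.0, 0.0), (800.0, 0.0), (1500.0, 0.0)]
print(control_points(signals, sweeps))
```

In this example the point far from the axis, (800, 760), is never selected, so the control points are fewer in number than the signal values, as claim 18 contemplates.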

12. The method of claim 1, wherein said determining parameters of a
registration transformation equation comprises projecting said control points to a line or
curve passing through said sweep points, thereby forming set points.

13. The method of claim 12, wherein said registration transformation
equation comprises affine transformation projecting said control points onto said set
points.
14. The method of claim 12, wherein said registration transformation
equation comprises linear conformational transformation projecting said control points
onto said set points.

15. The method of claim 12, wherein said registration transformation 
equation comprises projective transformation projecting said control points onto said set 
points. 

16. The method of claim 12, wherein said registration transformation
equation comprises polynomial transformation projecting said control points onto said
set points.
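The projection and affine registration of claims 12 and 13 can be sketched as below. This assumes two sets of sweep points, one on each axis (as in claim 23), so the set points fall on two different lines and the affine fit is not degenerate. The least-squares fit via normal equations and Cramer's rule is an assumption chosen to keep the sketch dependency-free; the claims do not specify a fitting method, and `numpy.linalg.lstsq` would be the usual library choice. All function names are hypothetical.

```python
def project_onto_line(point, p0, p1):
    """Orthogonal projection of `point` onto the line through p0 and p1."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    t = ((point[0] - p0[0]) * dx + (point[1] - p0[1]) * dy) / (dx * dx + dy * dy)
    return (p0[0] + t * dx, p0[1] + t * dy)


def _det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(m, v):
    """Solve the 3x3 system m @ p = v by Cramer's rule."""
    d = _det3(m)
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear; affine fit is underdetermined")
    solution = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = v[r]
        solution.append(_det3(mc) / d)
    return solution


def fit_affine(controls, set_points):
    """Least-squares affine map taking each control point to its set point.

    Returns (a, b, c, d, e, f) with x' = a*x + b*y + c and y' = d*x + e*y + f.
    """
    rows = [(x, y, 1.0) for x, y in controls]
    # Normal equations (A^T A) p = A^T t, solved once for x' and once for y'.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    params = []
    for k in range(2):
        atb = [sum(r[i] * t[k] for r, t in zip(rows, set_points)) for i in range(3)]
        params.extend(_solve3(ata, atb))
    return tuple(params)


def apply_affine(params, point):
    a, b, c, d, e, f = params
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)


# Control points near each axis; their set points are projections onto that axis.
near_x = [(100.0, 2.0), (1500.0, 30.0)]
near_y = [(18.0, 900.0), (28.0, 1400.0)]
controls = near_x + near_y
set_points = ([project_onto_line(p, (0.0, 0.0), (1.0, 0.0)) for p in near_x]
              + [project_onto_line(p, (0.0, 0.0), (0.0, 1.0)) for p in near_y])
params = fit_affine(controls, set_points)
print(apply_affine(params, (800.0, 760.0)))  # a heterozygous-looking point
```

The example control points were generated from axis-aligned points by a small symmetric cross-talk distortion, so an exact affine map back to the set points exists and the fit recovers it; the same parameters are then applied to every signal value, as in step (e) of claim 1.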

17. The method of claim 1, wherein said determining parameters of a
registration transformation equation comprises global registration.

18. The method of claim 1, wherein said set of control points is fewer in 
number compared to the number of first and second signal values. 

19. The method of claim 1, wherein said sweep points are located on a line
or curve through said coordinate system when represented graphically.

20. The method of claim 19, wherein said line comprises an axis of said 
coordinate system. 

21. The method of claim 19, wherein said sweep points are spaced along said
line or curve in a manner selected from the group consisting of linear, log-linear and
non-linear.
22. The method of claim 1, wherein said coordinate system comprises two
dimensions.

23. The method of claim 22, wherein step (b) comprises identifying two sets
of sweep points in said coordinate system; and step (c) comprises identifying two sets of
control points.

24. The method of claim 1, wherein said genetic data comprises n sets of
first, second and third signal values related in a coordinate system, wherein said first,
second and third signal values are indicative of the levels of a first, second and third
allele, respectively, at n loci.

25. The method of claim 24, wherein said coordinate system comprises three 
dimensions. 

26. The method of claim 24, wherein step (b) comprises identifying three 
sets of sweep points in said coordinate system; and step (c) comprises identifying three 
sets of control points. 

27. The method of claim 1, wherein said registration transformation is
selected from the group consisting of rotation of said n sets of first and second signal
values, translation of said n sets of first and second signal values, scaling of said n sets
of first and second signal values, and shear of said n sets of first and second signal
values.
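The elementary operations listed in claim 27 can be composed into a single set of affine parameters. The sketch below assumes one particular order (scale, then horizontal shear, then rotation, then translation); the claims do not fix an order, and `compose` and `apply_affine` are hypothetical helpers.

```python
import math


def compose(theta=0.0, sx=1.0, sy=1.0, shear=0.0, tx=0.0, ty=0.0):
    """Build affine parameters (a, b, c, d, e, f) from elementary operations.

    Order assumed here: scale, then horizontal shear, then rotate by theta
    (radians), then translate.  x' = a*x + b*y + c, y' = d*x + e*y + f.
    """
    # Shear @ Scale = [[1, shear], [0, 1]] @ [[sx, 0], [0, sy]]
    hs = [[sx, shear * sy], [0.0, sy]]
    rot = [[math.cos(theta), -math.sin(theta)],
           [math.sin(theta), math.cos(theta)]]
    # Linear part = Rotation @ Shear @ Scale.
    lin = [[sum(rot[i][k] * hs[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]
    return (lin[0][0], lin[0][1], tx, lin[1][0], lin[1][1], ty)


def apply_affine(params, point):
    a, b, c, d, e, f = params
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)


print(apply_affine(compose(shear=0.5, tx=5.0), (0.0, 2.0)))
```

Any single operation is obtained by leaving the others at their identity values, for example `compose(shear=0.5)` for a pure shear.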

28. The method of claim 1, further comprising a step of balancing said n sets 
of first and second signal values by a signal transformation, thereby balancing the 
probability function for the distribution of said n sets of first and second signal values as 
a function of signal intensity. 

29. The method of claim 28, wherein said signal transformation is selected
from the group consisting of natural logarithm, base 2 logarithm, base 10 logarithm,
arctangent, square root, nth root, wherein n > 2, and Box-Cox.
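The signal transformations of claims 28 and 29 can be sketched as a lookup table applied to both coordinates of each signal pair. The table keys, the choice of the cube root as one instance of an nth root with n > 2, and the Box-Cox parameter 0.5 are hypothetical; the sketch assumes positive signal intensities.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform; defined for positive x, with lam = 0 giving ln(x)."""
    if x <= 0:
        raise ValueError("Box-Cox requires a positive value")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def nth_root(n):
    """Return an nth-root function for n > 2; assumes non-negative input."""
    if n <= 2:
        raise ValueError("n must be greater than 2")
    return lambda x: x ** (1.0 / n)


TRANSFORMS = {
    "ln": math.log,
    "log2": math.log2,
    "log10": math.log10,
    "arctan": math.atan,
    "sqrt": math.sqrt,
    "cube_root": nth_root(3),
    "box_cox_half": lambda x: box_cox(x, 0.5),  # lambda value is an assumption
}


def balance(signals, name="log2"):
    """Apply one transform to both coordinates of every signal pair."""
    f = TRANSFORMS[name]
    return [(f(x), f(y)) for x, y in signals]


print(balance([(8.0, 1024.0)], "log2"))
```

Logarithmic and root transforms compress high intensities more than low ones, which is the sense in which they balance the spread of signal values across the intensity range.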



30. A method of clustering genetic data for n loci, wherein n is an integer
greater than one, comprising

(a) obtaining genetic data comprising n sets of first and second signal values
related in a coordinate system, wherein said first and second signal values
are indicative of the levels of a first and second allele, respectively, at n
loci;

(b) comparing fit of said genetic data to each of a plurality of cluster models 
using an artificial neural network, thereby determining a best fit cluster 
model; and 

(c) assigning said signal values to at least one cluster according to said best
fit cluster model.



31. The method of claim 30, wherein if said best fit cluster model contains at 
least one actual cluster and at least one missing cluster, then using a second artificial 
neural network to propose a location for said at least one missing cluster. 

32. The method of claim 30, wherein said plurality of cluster models 
comprises at least seven cluster models. 

33. The method of claim 30, further comprising training said first or second
artificial neural network with an algorithm selected from the group consisting of a
genetic algorithm, back-propagation algorithm, Levenberg-Marquardt algorithm and
Bayesian algorithm.
34. The method of claim 30, wherein said first or second artificial neural
network comprises a 3 layer feed-forward artificial neural network or a two layer
artificial neural network.
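The model comparison of claims 30, 32 and 34 can be sketched as a forward pass through a 3-layer feed-forward network that scores each candidate cluster model. Several assumptions are made here: the seven cluster models are read as the non-empty subsets of the three genotype clusters {AA, AB, BB}; the input features are a hypothetical histogram of polar angles; and the weights are untrained random placeholders, so the chosen model is illustrative only. A real system would train the weights, for example by one of the algorithms listed in claim 33.

```python
import itertools
import math
import random

GENOTYPES = ("AA", "AB", "BB")
# Assumed reading of claim 32: every non-empty subset of three clusters -> 7 models.
CLUSTER_MODELS = [c for r in range(1, 4) for c in itertools.combinations(GENOTYPES, r)]


def angle_histogram(signals, bins=8):
    """Hypothetical input features: fraction of points per polar-angle bin."""
    counts = [0] * bins
    for x, y in signals:
        theta = math.atan2(y, x)  # 0 on the first-allele axis, pi/2 on the second
        idx = max(0, min(int(theta / (math.pi / 2) * bins), bins - 1))
        counts[idx] += 1
    return [c / len(signals) for c in counts]


class FeedForward:
    """Input layer -> tanh hidden layer -> softmax output layer."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # Untrained placeholder weights.
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, features):
        hidden = [math.tanh(sum(w * f for w, f in zip(row, features)) + b)
                  for row, b in zip(self.w1, self.b1)]
        logits = [sum(w * h for w, h in zip(row, hidden)) + b
                  for row, b in zip(self.w2, self.b2)]
        peak = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(z - peak) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]


def best_fit_model(signals, net):
    """Score every cluster model and return the highest-scoring one."""
    scores = net.forward(angle_histogram(signals))
    best = max(range(len(scores)), key=scores.__getitem__)
    return CLUSTER_MODELS[best], scores


net = FeedForward(n_in=8, n_hidden=5, n_out=len(CLUSTER_MODELS))
print(best_fit_model([(1000.0, 50.0), (900.0, 40.0), (600.0, 580.0)], net))
```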

35. The method of claim 30, wherein said coordinate system comprises two 
dimensions. 

36. The method of claim 35, wherein step (c) comprises assigning said signal 
values to at least one cluster according to said best fit cluster model, wherein if said 
best fit cluster model contains at least one actual cluster and fewer than three actual 
clusters, then using a second artificial neural network to propose a location for at least 
one missing cluster, wherein the sum of actual and missing clusters is three. 

37. The method of claim 36, wherein if said best fit cluster model contains 
one actual cluster, then using said second artificial neural network to propose a location 
for two missing clusters. 

38. The method of claim 37, further comprising separately training said 
artificial neural network for proposing locations for said two missing clusters. 

39. The method of claim 36, wherein if said best fit cluster model contains
two actual clusters, then using said second artificial neural network to propose a
location for one missing cluster.
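The bookkeeping of claims 36 to 39 (actual plus missing clusters summing to three in two dimensions) can be sketched as follows. The location rule here is a plain geometric stand-in, not the second artificial neural network that the claims recite: it assumes that, after normalization, AA lies near the first-allele axis, BB near the second and AB on the diagonal, and it places each missing cluster at that canonical angle and at the mean radius of the actual clusters. `propose_missing` and `CANONICAL_ANGLE` are hypothetical.

```python
import math

GENOTYPES = ("AA", "AB", "BB")
# Hypothetical canonical polar angles after normalization.
CANONICAL_ANGLE = {"AA": 0.0, "AB": math.pi / 4, "BB": math.pi / 2}


def propose_missing(actual):
    """Propose centroids for clusters absent from the best-fit model.

    `actual` maps genotype -> (x, y) centroid.  With one actual cluster two
    locations are proposed (claim 37); with two, one is proposed (claim 39).
    This geometric rule stands in for the claimed second neural network.
    """
    if not 1 <= len(actual) < len(GENOTYPES):
        return {}
    missing = [g for g in GENOTYPES if g not in actual]
    radius = sum(math.hypot(x, y) for x, y in actual.values()) / len(actual)
    return {g: (radius * math.cos(CANONICAL_ANGLE[g]),
                radius * math.sin(CANONICAL_ANGLE[g])) for g in missing}


print(propose_missing({"AA": (1000.0, 0.0), "BB": (0.0, 1000.0)}))
```

The three-dimensional case of claim 42 would follow the same pattern with six genotype clusters.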

40. The method of claim 30, wherein said genetic data comprises n sets of 
first, second and third signal values related in a coordinate system, wherein said first, 
second and third signal values are indicative of the levels of a first, second and third 
allele, respectively, at n loci. 
41. The method of claim 40, wherein said coordinate system comprises three
dimensions.

42. The method of claim 41, wherein step (c) comprises assigning said signal
values to at least one cluster according to said best fit cluster model, wherein if said best
fit cluster model contains at least one actual cluster and fewer than six actual clusters,
then using a second artificial neural network to propose a location for at least one
missing cluster, wherein the sum of actual and missing clusters is six.

43. The method of claim 30, wherein said genetic data is represented in a
graphical format.

44. The method of claim 43, wherein said graphical format comprises 
Cartesian coordinates. 

45. The method of claim 30, wherein said genetic data is provided in a 
tabular format. 



46. The method of claim 30, wherein n is at least 2. 



47. A genotyping system, comprising 

(a) an array reader configured to detect signals from separate locations on an 
array substrate; 

(b) a computer processor configured to receive signal values from said array
reader;

(c) a normalization module comprising commands for 

(i) reading said signal values; 

(ii) identifying a set of sweep points for said signal values in a 
coordinate system; 
(iii) identifying a set of control points, said control points comprising
at least a subset of said signal values that are proximal to said
sweep points;

(iv) determining parameters of a registration transformation equation 
based on said control points; and 

(v) transforming said signal values according to said registration 
transformation equation and said parameters, thereby providing 
normalized genetic data; and 

(d) a clustering module comprising commands for

(i) reading said normalized genetic data; 

(ii) comparing fit of said normalized genetic data to each of a 
plurality of cluster models using an artificial neural network, 
thereby determining a best fit cluster model; and 

(iii) assigning said signal values to at least one cluster according to 
said best fit cluster model, wherein if said best fit cluster model 
contains at least one actual cluster and at least one missing cluster, 
then using a second artificial neural network to propose a location 
for said at least one missing cluster. 

48. A method of determining a genotype score, comprising

(a) obtaining genetic data comprising n sets of first and second signal values
related in a coordinate system, wherein said first and second signal values
are indicative of the levels of a first and second allele, respectively, at n
loci;

(b) identifying a set of sweep points in said coordinate system;

(c) identifying a set of control points, said control points comprising at least
a subset of said signal values that are proximal to said sweep points;

(d) determining parameters of a registration transformation equation based
on said set of control points;

(e) transforming said n sets of first and second signal values according to
said registration transformation equation and said parameters, thereby
normalizing said genetic data;

(f) comparing fit of said normalized genetic data to each of a plurality of
cluster models using an artificial neural network, thereby determining a
best fit cluster model;

(g) assigning said signal values to at least one cluster according to said best
fit cluster model, wherein if said best fit cluster model contains at least
one actual cluster and at least one missing cluster, then using a second
artificial neural network to propose a location for said at least one
missing cluster; and

(h) determining, for an individual, the alleles present at said n loci.